Stochastic Variance Reduction Methods for Policy Evaluation
Authors
Abstract
Policy evaluation is a crucial step in many reinforcement-learning procedures: it estimates a value function that predicts states' long-term value under a given policy. In this paper, we focus on policy evaluation with linear function approximation over a fixed dataset. We first transform the empirical policy evaluation problem into a (quadratic) convex-concave saddle-point problem, and then present a primal-dual batch gradient method, as well as two stochastic variance reduction methods, for solving it. These algorithms scale linearly in both sample size and feature dimension. Moreover, they achieve linear convergence even when the saddle-point problem is only strongly concave in the dual variables and not strongly convex in the primal variables. Numerical experiments on benchmark problems demonstrate the effectiveness of our methods.
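As a rough sketch of the approach, the Python snippet below implements a primal-dual batch gradient iteration for the standard saddle-point reformulation min_w max_u u^T(b - Aw) - (1/2) u^T C u, where A, b, and C are empirical statistics built from state features, successor-state features, and rewards. The function name, step sizes, and iteration count are illustrative assumptions, not details fixed by the abstract.

import numpy as np

def pdbg_policy_evaluation(phi, phi_next, rewards, gamma=0.99,
                           sigma_w=0.01, sigma_u=0.01, n_iters=2000):
    """Primal-dual batch gradient sketch for the saddle-point form of
    policy evaluation with linear function approximation.

    phi      : (n, d) features of visited states
    phi_next : (n, d) features of successor states
    rewards  : (n,)   observed one-step rewards
    """
    n, d = phi.shape
    # Empirical statistics of the linear system A w = b (LSTD-style).
    A = phi.T @ (phi - gamma * phi_next) / n   # (d, d)
    b = phi.T @ rewards / n                    # (d,)
    C = phi.T @ phi / n                        # (d, d) feature covariance

    # Saddle point:  min_w max_u  u @ (b - A @ w) - 0.5 * u @ C @ u;
    # the optimal w solves A w = b, and the objective in w equals
    # 0.5 * ||A w - b||^2 in the C^{-1}-weighted norm.
    w, u = np.zeros(d), np.zeros(d)
    for _ in range(n_iters):
        grad_w = -A.T @ u               # gradient in the primal variable
        grad_u = b - A @ w - C @ u      # gradient in the dual variable
        w -= sigma_w * grad_w           # descend in w
        u += sigma_u * grad_u           # ascend in u
    return w

The stochastic variance reduction variants described in the abstract would, in the spirit of SVRG, replace the full-batch products A @ w, A.T @ u, and C @ u with per-sample estimates corrected by periodic full-batch snapshots.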
Similar papers
Stochastic Variance Reduction for Policy Gradient Estimation
Recent advances in policy gradient methods and deep learning have demonstrated their applicability for complex reinforcement learning problems. However, the variance of the performance gradient estimates obtained from the simulation is often excessive, leading to poor sample efficiency. In this paper, we apply the stochastic variance reduced gradient descent (SVRG) technique [1] to model-free p...
Full text
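For reference, the core SVRG recipe this snippet refers to can be sketched generically as follows; the function name grad_i, the epoch length, and the toy least-squares usage are illustrative choices, not details from the cited paper.

import numpy as np

def svrg(grad_i, w0, n, step=0.1, epochs=20, m=None, seed=0):
    """Generic SVRG loop: grad_i(w, i) is the gradient of the i-th
    component function; the objective is their average over i = 0..n-1."""
    rng = np.random.default_rng(seed)
    m = m or 2 * n                       # inner-loop length per snapshot
    w_snap = np.asarray(w0, dtype=float)
    for _ in range(epochs):
        # Full gradient at the snapshot, computed once per epoch.
        mu = sum(grad_i(w_snap, i) for i in range(n)) / n
        w = w_snap.copy()
        for _ in range(m):
            i = rng.integers(n)
            # Variance-reduced gradient: unbiased, and its variance
            # shrinks as w and w_snap approach the optimum.
            g = grad_i(w, i) - grad_i(w_snap, i) + mu
            w = w - step * g
        w_snap = w
    return w_snap

# Toy usage on least squares: f_i(w) = 0.5 * (x_i @ w - y_i)^2.
X = np.random.default_rng(1).normal(size=(200, 5))
y = X @ np.ones(5)
w = svrg(lambda w, i: (X[i] @ w - y[i]) * X[i], np.zeros(5), n=200, step=0.05)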
Performance Evaluation and Policy Selection in Multiclass Networks
This paper concerns modelling and policy synthesis for regulation of multiclass queueing networks. A 2-parameter network model is introduced to allow independent modelling of variability and mean processing rates, while maintaining simplicity of the model. Policy synthesis is based on consideration of more tractable workload models, and then translating a policy from this abstraction to the dis...
Full text
Variance Reduction for Policy Gradient with Action-Dependent Factorized Baselines
Policy gradient methods have enjoyed great success in deep reinforcement learning but suffer from high variance of gradient estimates. The high-variance problem is particularly exacerbated in problems with long horizons or high-dimensional action spaces. To mitigate this issue, we derive a bias-free action-dependent baseline for variance reduction which fully exploits the structural form of the...
Full text
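The variance-reduction effect of a baseline is easy to demonstrate in one dimension. The toy below uses a plain constant mean-reward baseline to show the underlying control-variate principle; the paper's contribution, an action-dependent factorized baseline, refines this idea, and everything here (the Gaussian policy, the quadratic reward) is an illustrative assumption.

import numpy as np

rng = np.random.default_rng(0)

# Score-function gradient of E[R(a)] for a ~ N(theta, 1).  Subtracting a
# constant baseline b keeps the estimator unbiased because
# E[(R - b) * score] = E[R * score] - b * E[score], and E[score] = 0.
theta, n = 0.5, 100_000
a = rng.normal(theta, 1.0, size=n)
reward = -(a - 2.0) ** 2           # toy reward, maximized at a = 2
score = a - theta                  # d/dtheta log N(a; theta, 1)

for b in (0.0, reward.mean()):     # no baseline vs. mean-reward baseline
    g = (reward - b) * score
    print(f"baseline={b:8.3f}  grad_mean={g.mean():+.3f}  grad_var={g.var():.2f}")

Both runs estimate the same gradient (about 2 * (2 - theta) = 3 here), but the baselined estimator's variance is far smaller.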
Deep Reinforcement Learning
In reinforcement learning (RL), stochastic environments can make learning a policy difficult due to high variance. As such, variance reduction methods have been investigated in other works, such as advantage estimation and control-variates estimation. Here, we propose to learn a separate reward estimator to train the value function, to help reduce variance caused by a noisy reward sig...
Full text
Integrated Variance Reduction Strategies for Simulation
We develop strategies for integrated use of certain well-known variance reduction techniques to estimate a mean response in a finite-horizon simulation experiment. The building blocks for these integrated variance reduction strategies are the techniques of conditional expectation, correlation induction (including antithetic variates and Latin hypercube sampling), and control variates; and all p...
Full text
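As a self-contained illustration of two of the building blocks named in that snippet, the code below estimates E[exp(U)] for U ~ Uniform(0, 1) (true value e - 1) with plain Monte Carlo, antithetic variates, and a control variate; the integrand and coefficient choices are illustrative, not taken from the paper.

import numpy as np

rng = np.random.default_rng(0)
n = 100_000
u = rng.random(n)

# Plain Monte Carlo estimator of E[exp(U)].
plain = np.exp(u)

# Antithetic variates: pair U with 1 - U; exp is monotone, so the pair
# is negatively correlated (note each pair costs two evaluations).
anti = 0.5 * (np.exp(u) + np.exp(1.0 - u))

# Control variate: U itself has known mean 1/2; use the optimal
# coefficient c = -Cov(Y, U) / Var(U), estimated from the sample.
y = np.exp(u)
c = -np.cov(y, u)[0, 1] / u.var()
ctrl = y + c * (u - 0.5)

for name, est in (("plain", plain), ("antithetic", anti), ("control", ctrl)):
    print(f"{name:10s} mean={est.mean():.5f}  per-sample var={est.var():.6f}")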